Temporal-Textual Retrieval: Time and Keyword Search in Web Documents
Abstract
As the web ages, many web documents become relevant only to certain time periods, such as web pages containing news and events or those documenting natural phenomena. Hence, to retrieve the most relevant pages, a user may want to specify the relevant time period(s) in addition to the relevant keywords, e.g., “Barack Obama 1980-1985”. Unfortunately, little work has been done by industry or academia to support this type of search. To the best of our knowledge, the only way that some search engines exploit the time information in the user query is to filter out those result pages whose publication or modification time is not within the queried time interval. In this paper, we propose a new indexing and ranking framework for temporal-textual retrieval. The framework leverages the classical vector space model and provides a complete scheme for indexing, query processing and ranking of temporal-textual queries. We propose a variety of approaches to exploit popular keyword and temporal index structures. We present a novel hybrid index structure which indexes both the temporal and the textual aspects of the documents in a unified, integrated manner. We also study how to rank documents by seamlessly combining their temporal and textual features. We develop a new scoring scheme called temporal tf-idf to compute the temporal relevance of a document to a query, and we combine this score with the textual relevance to compute the overall relevance score of the document to the query. We present both a cost model analysis and an extensive set of experiments over real-world datasets (New York Times Annotated Corpus and Freebase) to evaluate the proposed framework and demonstrate its efficiency and effectiveness.
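The idea of combining a temporal score with a textual score can be illustrated with a minimal sketch. It is not the paper's temporal tf-idf, whose definition the abstract does not give. The sketch assumes that each document carries a list of inclusive year intervals, that temporal relevance is the best Jaccard overlap between the query interval and any document interval, and that the two scores are mixed linearly with a weight alpha. All function names, the weighting, and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_terms, corpus):
    """Sum of tf * idf over query terms (a plain textual score, assumed here)."""
    counts = Counter(doc_terms)
    n_docs = len(corpus)
    score = 0.0
    for term in set(query_terms):
        # Document frequency: how many documents contain the term.
        df = sum(1 for doc in corpus if term in doc["terms"])
        if df == 0 or counts[term] == 0:
            continue
        tf = counts[term] / len(doc_terms)
        idf = math.log(1 + n_docs / df)
        score += tf * idf
    return score


def interval_overlap(a, b):
    """Jaccard overlap of two inclusive year intervals given as (start, end)."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = max(a[1], b[1]) - min(a[0], b[0]) + 1
    return inter / union


def temporal_score(query_interval, doc_intervals):
    """Best overlap between the query interval and any document interval."""
    return max(
        (interval_overlap(query_interval, iv) for iv in doc_intervals),
        default=0.0,
    )


def combined_score(query_terms, query_interval, doc, corpus, alpha=0.5):
    """Linear mix of both scores; alpha is an assumed tuning weight."""
    textual = tf_idf_score(query_terms, doc["terms"], corpus)
    temporal = temporal_score(query_interval, doc["intervals"])
    return alpha * textual + (1 - alpha) * temporal


# Toy corpus, invented purely for illustration.
corpus = [
    {"id": "d1", "terms": ["obama", "college", "years"], "intervals": [(1979, 1983)]},
    {"id": "d2", "terms": ["obama", "president"], "intervals": [(2009, 2017)]},
]

ranked = sorted(
    corpus,
    key=lambda d: combined_score(["obama"], (1980, 1985), d, corpus),
    reverse=True,
)
print([d["id"] for d in ranked])
```

With alpha set to 1.0 the ranking falls back to text only, which is roughly the keyword-only behaviour the abstract contrasts against. Lower values let the queried period influence the order instead of acting as a hard filter.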
Similar resources
An Effective Path-aware Approach for Keyword Search over Data Graphs
Keyword search is known as a user-friendly alternative to structured languages for retrieving information from graph-structured data. Efficient retrieval of relevant answers to a keyword query and effective ranking of these answers according to their relevance are the two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
Tiwiki: Searching Wikipedia with Temporal Constraints
Temporal information retrieval has received a lot of attention in recent years, and it is now widely accepted in the IR community that temporal information needs are important to tackle. A particular type of temporal query is one with explicit temporal constraints; such queries make up almost 15% of today’s Web search queries. Although several approaches to allow textual search combin...
Relation Inclusive Search for Hindi Documents
Information retrieval (IR) techniques become a challenge to researchers due to huge growth of digital and information retrieval. As a wide variety of Hindi Data and Literature is now available on web, we have developed information retrieval system for Hindi documents. This paper presents a new searching technique that has promising results in terms of F-measure. Historically, there have been tw...
Web Page Structure Enhanced Feature Selection for Classification of Web Pages
Web page classification is achieved using text classification techniques. Web page classification is different from traditional text classification due to additional information, provided by web page structure which provides much information on content importance. HTML tags provide visual web page representation and can be considered a parameter to highlight content importance. Textual keywords...
Integrating Keywords and Semantics on Document Annotation and Search
This paper describes GoNTogle, a framework for document annotation and retrieval, built on top of Semantic Web and IR technologies. GoNTogle supports ontology-based annotation for documents of several formats, in a fully collaborative environment. It provides both manual and automatic annotation mechanisms. Automatic annotation is based on a learning method that exploits user annotation history...
Journal: IJNGC
Volume: 3
Publication date: 2012